[57] ABSTRACT

A system and method are provided for searching for desired
items from a network of information resources. In particular,
the system and method have advantageous applicability to
searching for World Wide Web pages having desired content.
An initial set of pages is selected, preferably by running a
conventional keyword-based query, and then further selecting
pages pointing to, or pointed to from, the pages found by the
keyword-based query. Alternatively, the invention may be
applied to a single page, where the initial set includes pages
pointed to by the single page and pages which point to the
single page. Then, iteratively, authoritativeness values are
computed for the pages of the initial set, based on the number
of links to and from the pages. One or more communities, or
"neighborhoods", of related pages are defined based on the
authoritativeness values thus produced. Such communities of
pages are likely to be of particular interest and value to the
user who is interested in the keyword-based query or the
single page.
METHOD AND SYSTEM FOR IDENTIFYING
AUTHORITATIVE INFORMATION
RESOURCES IN AN ENVIRONMENT WITH
CONTENT-BASED LINKS BETWEEN
INFORMATION RESOURCES

FIELD OF THE INVENTION

The invention generally relates to the field of information
networks, communication, and information storage and
retrieval. More specifically, the invention relates to the task
of searching for items, among a networked collection of
information resources, the items satisfying a desired
criterion. The invention has particular applicability to
hypertext/hyperlinked environments such as the World Wide Web.

GLOSSARY OF TERMS USED

While dictionary meanings are also implied by certain
terms used here, the following glossary of some terms may
be useful.

Graphical User Interface (GUI): A computer user interface
characterized by the visual "desktop" paradigm, having
images, windows, icons, and graphical menus representative
of data objects, functions, or application programs, and
utilizing a cursor, movable by a user input device such as a
mouse, for selecting and manipulating the icons, etc., by
clicking on mouse input buttons; as distinct from a
character- or text-oriented user interface.

Internet ("the Net"): A connection system that links
computers worldwide in a network.

TCP/IP: Transmission Control Protocol/Internet Protocol.
A packet switching scheme the Internet uses to chop, route,
and reconstruct the data it handles, from e-mail to video.

World Wide Web (WWW, "the Web"): The Internet's
multimedia application that lets people seeking information
on the Internet switch from server to server and database to
database by clicking on highlighted words or phrases of
interest. An Internet Web server supports clients and
provides information.

Home page: A multimedia table of contents that guides a
Web user to stored information on the Internet.

Server: A machine (computer) which performs a task at
the command of another machine ("client"). In the context
of the present invention, a server's primary function is to
facilitate distribution of stored information over the Web.

Client: A machine which provides commands to a server,
and is serviced by the server. Typically, a client machine is
operated by an end user, and functions responsive to user
commands.

Web Browser: A program running on a user-operated
client computer. When a user "surfs" the Web using a
browser, the browser acts as an Internet tour guide, allowing
the client machine to display pictorial desktops, directories
and search tools supported by the server.

URL: Universal Resource Locator, a Web document
version of an e-mail address, in character string form, which
uniquely identifies a document, application, or tool available
over the Web.

Hyperlink: A network addressing tool embedded in a
user-understandable displayed and/or highlighted item, such
as a word, phrase, icon or picture. A URL can be accessed
by means of its corresponding Hyperlink. When a user on a
client machine selects the highlighted hyperlink through the
user interface, the underlying item is then retrieved to the
client supporting a Web browser.

HTTP: Hypertext transfer protocol. The character string
"http:" at the beginning of a URL indicates that the document
or file designated by the URL contains hyperlinks defined
according to the HTTP.

HyperText Markup Language (HTML): HTML is the
language used by Web servers to create and connect
documents that are viewed by Web clients. HTML uses
Hypertext documents. Other uses of Hypertext documents are
described in the following U.S. patents:

Bernstein et al., U.S. Pat. No. 5,204,947, issued Apr. 20, 1993;

Bernstein et al., U.S. Pat. No. 5,297,249, issued Mar. 22, 1994;
and

Lewis, U.S. Pat. No. 5,355,472, issued Oct. 11, 1994;

all of which are assigned to International Business Machines
Corporation, and which are referenced herein.

BACKGROUND OF THE INVENTION

In recent years, the technology of multimedia storage and
interactive accessing has converged with that of network
communications technologies, to present exciting prospects
for users who seek access to remotely stored multimedia
information. Particularly exciting has been the recent
prominence of the Internet and its progeny, the World Wide Web.
The Internet and the Web have captured the public imagination
as the so-called "information superhighway." Accessing
information through the Web has become known by the
metaphorical term "surfing the Web."

The Internet is not a single network, nor does it have any
single owner or controller. Rather, the Internet is an unruly
network of networks, a confederation of many different
networks, public and private, big and small, whose human
operators have agreed to connect to one another.

The composite network represented by these networks
relies on no single transmission medium. Bi-directional
communication can occur via satellite links, fiber-optic
trunk lines, phone lines, cable TV wires, and local radio
links. However, no other communication medium is quite as
ubiquitous or easy to access as the telephone network. The
number of Web users has exploded, largely due to the
convenience of accessing the Internet by coupling home
computers, through modems, to the telephone network. As a
consequence, many aspects of the Internet and the Web, such
as network communication architectures and protocols, have
evolved based on the premise that the communication
medium may be one of limited bandwidth, such as the
telephone network.

To this point, the Web has been used in industry
predominately as a means of communication, advertisement, and
placement of orders. The Web facilitates user access to
information resources by letting the user jump from one Web
page, or from one server, to another, simply by selecting a
highlighted word, picture or icon (a program object
representation) about which the user wants more information.
The programming construct which makes this maneuver
possible is known as a "hyperlink".

In order to explore the Web today, the user loads a special
navigation program, called a "Web browser", onto his
computer. A browser is a program which is particularly tailored
for facilitating user requests for Web pages by implementing
hyperlinks in a graphical environment. If a word or phrase,
appearing on a Web page, is configured as a hyperlink to
another Web page, the word or phrase is typically given in
a color which contrasts with the surrounding text or
background, underlined, or otherwise highlighted.
Accordingly, the word or phrase defines a region, on the
graphical representation of the Web page, inside of which a
mouse click will activate the hyperlink, request a download 
of the linked-to page, and display the page when it is 
downloaded. 

There are a number of browsers presently in existence and 
in use. Common examples are the Netscape, Microsoft,
Mosaic, and IBM's Web Explorer browsers. Browsers allow
a user of a client to access servers located throughout the 
world for information which is stored therein. The informa- 
tion is then provided to the client by the server by sending 
files or data packets to the requesting client from the server's 
storage resources. 

Part of the functionality of a browser is to provide image 
or video data. Web still image or video information can be 
provided, through a suitably designed Web page or interface, 
to a user on a client machine. Still images can also be used 
as Hypertext-type links, selectable by the user, for invoking 
other functions. For instance, a user may run a video clip by 
selecting a still image. 

A user of a Web browser who is researching a particular 
area of interest will often want to make a content-based 
search, over as many Web pages as practicable, to identify 
Web pages whose content relates to the area of interest. To 
meet this need, search engines have been developed, which 
execute keyword-based searches to find Web pages whose
content satisfies logical constraints given in terms of the 
keywords. Examples are Yahoo and AltaVista. 

To be effective, a search engine must effectively identify 
content, capturing relevant pages and discarding irrelevant 
pages. This effectiveness relies partly on the user's skill at 
crafting a keyword search command, and partly on the 
search engine's ability to avoid false hits and false misses. 
The latter factor is a function of the design of the search 
engine. 

Thus, an important design objective in an Internet/Web 
search engine is to facilitate the user's desire to find Web 
pages whose content matches what he/she desires. There is 
a significant need for systems and techniques which facili- 
tate higher quality search results. 

A number of current methods provide mechanisms for 
searching in such an environment. Most current methods in 
use perform searching by computing some type of similarity 
measure between the terms appearing in the user's query 
string and the words appearing in the set of pages. The pages 
that score highest under this similarity measure are then 
deemed to be the most relevant. 

In a hyper-text environment that is sufficiently large and 
unstructured, this approach has the following limitation. For 
queries that are sufficiently "general" in nature, a search 
based on term-matching can easily return several thousand 
pages that are highly "relevant" to the query, in the sense that 
they score highly under the term-based similarity measure. 
This results in a volume of output much greater than a 
human user can digest. 

There is a need, therefore, for techniques which allow a 
user to find, from among a large set of pages which are 
relevant in the sense of term matching, those fewer pages 
which can be of particular help to the user in his/her quest 
for desired information. 

Some conventional techniques have made use of pointers 
(e.g., hyperlinks) to and from an initial set of information 
items. See Kochtanek, "Document Clustering, Using Macro 
Retrieval Techniques," Journal of the American Society for 
Information Science, vol. 34, no. 5, September 1983, pp. 
356-359. However, there remains a need for further, more 
sophisticated techniques that produce better quality infor- 
mation for the user. 
SUMMARY OF THE INVENTION

It is therefore an object of the invention to provide a new
strategy and technique for obtaining desired information in
a hyperlink environment.

It is a further object of the invention to find, from among
a collection of information resources such as Web pages,
where content-based links (e.g., hyperlinks) exist between
different information resources, a set of information
resources which satisfy a desired criterion.

It is a further object of the invention to find, from among
a collection of information resources such as Web pages,
where content-based links (e.g., hyperlinks) exist between
different information resources, a set of information
resources which are, in a sense, "authoritative" as to a
particular subject.

It is a further object of the invention to find, from among
a collection of information resources such as Web pages,
where content-based links (e.g., hyperlinks) exist between
different information resources, a set of information
resources which are, in a sense, "authoritative" as to a
particular subject, responsive to a query directed to that
subject.

To achieve these and other objects, the present invention
is directed to a method and system for automatically
identifying the most authoritative pages from among a large set
of hyperlinked pages. (Note that the term "page" will be
used for the sake of brevity, without limiting or detracting
from the meaning denoted or implied by the broader term
"information resources.") A user may use the invention if
he/she has a page, whose content is of interest, and desires
to find other pages which are authoritative as to that content
of interest.

Alternatively, the user might begin with a query, such as
a keyword-based search strategy in a Web search engine, and
retrieve a set of pages that satisfy that query. The invention
is then utilized to find a set of pages which are authoritative
as to the subject matter in the pages located. This set of
pages produced by the invention may include a subset of the
retrieved pages, as well as pages not retrieved but which are
linked to pages that were retrieved.

The method of the invention includes the following steps:

First, an initial set of pages is obtained. The method may
begin with a single page, where the content of that page is
of interest, or with a group of pages, for instance produced
as a result of a keyword-based query by a Web search
engine. Because of the content-based links (e.g., hyperlinks)
between the pages, there will be a certain number of
additional pages linked to or from the single page, or group
of pages. The initial set, then, includes the single page or
group of pages, plus the linked pages.
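
The construction of the initial set can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_initial_set`, the dictionary representation of links (page mapped to the set of pages it points to), and the sample page names are all assumptions made for the example.

```python
from typing import Dict, Iterable, Set


def build_initial_set(seed_pages: Iterable[str],
                      links: Dict[str, Set[str]]) -> Set[str]:
    """Return the seed pages plus every page linked to or from a seed page.

    `links` maps a page to the set of pages it points to (assumed format).
    """
    seeds = set(seed_pages)
    initial = set(seeds)
    for source, targets in links.items():
        if source in seeds:
            initial |= targets       # pages that a seed page points to
        if targets & seeds:
            initial.add(source)      # pages that point to a seed page
    return initial


# Hypothetical link data: "a" is the single page of interest.
links = {
    "a": {"b", "c"},
    "d": {"a"},
    "e": {"f"},
}
print(sorted(build_initial_set(["a"], links)))  # ['a', 'b', 'c', 'd']
```

Starting from a group of pages returned by a keyword query works the same way: pass the whole group as `seed_pages`.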

Then, authoritativeness information is obtained for the
pages of the initial set. The authoritativeness information
exists on a per page basis, and is related to the number of
links to or from the page. At first, the links are simply
counted. In a preferred class of embodiments, however, a
sequence of iterations is performed, in which the
authoritativeness information, in the form of scores such as
numerical scores, is produced, for each given page in each
successive iteration, by summing the scores, from the previous
iteration, of pages linked to or from the given page.
Preferably, the scores are normalized after each iteration. It
can be proven that the scores obtained in this fashion will
converge.
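
The iteration just described can be sketched as below. Several details are assumptions made for the example rather than statements of the patented method: every page starts with a hub score of 1 (so the first pass amounts to counting incoming links), the hub update uses the authority scores computed in the same pass, and normalization scales each score vector to unit Euclidean length. The names `normalize` and `hub_authority_scores` are hypothetical.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length (one possible normalization)."""
    norm = math.sqrt(sum(s * s for s in scores.values()))
    if norm == 0.0:
        return dict(scores)
    return {page: s / norm for page, s in scores.items()}


def hub_authority_scores(links, iterations=20):
    """Iteratively score pages; `links` maps a page to the pages it points to."""
    pages = set(links)
    for targets in links.values():
        pages |= set(targets)
    hub = {p: 1.0 for p in pages}    # assumed starting values
    auth = {p: 0.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking to the page.
        auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        # Hub score: sum of the fresh authority scores of the linked-to pages.
        hub = {p: sum(auth[t] for t in links.get(p, ())) for p in pages}
        auth, hub = normalize(auth), normalize(hub)
    return hub, auth


# Hypothetical link data: three link-rich pages pointing at two targets.
example_links = {"h1": {"a1", "a2"}, "h2": {"a1", "a2"}, "h3": {"a1"}}
hub, auth = hub_authority_scores(example_links)
best_authority = max(auth, key=auth.get)
print(best_authority)  # a1
```

A fixed iteration count is used here for simplicity; stopping once the scores change by less than a small tolerance would be an equally reasonable choice.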

Finally, "neighborhoods" or "communities" of pages are
obtained from the resultant authoritativeness information. A
single neighborhood may be obtained, or several distinct
neighborhoods may be obtained by partitioning the scores
into ranges.
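
As a rough illustration of this last step, the sketch below extracts a single neighborhood as the k highest-scoring pages, and alternatively splits pages into groups by score range. The text does not say how ranges are chosen, so the boundary values, the function names `top_k` and `partition_by_ranges`, and the sample scores are assumptions for the example only.

```python
import bisect


def top_k(scores, k):
    """Return the k pages with the largest scores (ties broken by name)."""
    return sorted(scores, key=lambda page: (-scores[page], page))[:k]


def partition_by_ranges(scores, boundaries):
    """Group pages by the score range they fall in.

    `boundaries` must be sorted ascending; n boundaries give n + 1 groups,
    ordered from the lowest range to the highest.
    """
    groups = [[] for _ in range(len(boundaries) + 1)]
    for page in sorted(scores):
        groups[bisect.bisect_right(boundaries, scores[page])].append(page)
    return groups


# Hypothetical scores; signs may be positive or negative.
scores = {"a": 0.9, "b": 0.4, "c": -0.6, "d": 0.05}
print(top_k(scores, 2))                          # ['a', 'b']
print(partition_by_ranges(scores, [-0.1, 0.1]))  # [['c'], ['d'], ['a', 'b']]
```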

The basic concept of the invention has applicability in
numerous areas. For instance, in a population of computer
network users, links may take the form of E-mail messages
from one user to another. The invention may be practiced to
define communities of users, within the population, which
communicate with each other frequently, or to identify
individual users who are authoritative sources of information,
based on the number of E-mail messages they receive, and from
whom they are received.
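
To make the E-mail example concrete, the following sketch turns message records into the same kind of link structure used for pages. The record format (sender, recipient pairs), the function names, and the user names are assumptions for illustration; a real message log would need its own parsing.

```python
from collections import defaultdict


def message_links(messages):
    """Collapse (sender, recipient) records into sender -> {recipients}."""
    links = defaultdict(set)
    for sender, recipient in messages:
        if sender != recipient:          # ignore messages sent to oneself
            links[sender].add(recipient)
    return dict(links)


def received_from(links):
    """Invert the link map: recipient -> set of distinct senders."""
    senders = defaultdict(set)
    for sender, recipients in links.items():
        for recipient in recipients:
            senders[recipient].add(sender)
    return dict(senders)


# Hypothetical message log.
messages = [("ann", "bob"), ("cat", "bob"), ("ann", "bob"), ("bob", "dan")]
user_links = message_links(messages)
inbound = received_from(user_links)
print(sorted(inbound["bob"]))  # ['ann', 'cat']
```

The resulting `user_links` map has the same shape as a page-to-page link map, so the same scoring procedure could be applied to users instead of pages.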

Another area in which the invention may be applied is the
area of telephone call records. As above, certain telephone
subscribers may be considered authoritative by virtue of
receiving a large number of calls from a distinct segment of
the telephone subscriber population. These population
segments, and the authoritative subscribers, may be
identified through use of the invention.

However, it is believed that the invention has particular 
applicability to the World Wide Web. A user, having interest 
in a particular area of subject matter and seeking Web pages 
related to that subject matter, may advantageously use the 
invention to locate authoritative pages on that subject matter. 

While the invention is primarily disclosed as a method, it
will be understood by a person of ordinary skill in the art that
an apparatus, such as a conventional data processor,
including a CPU, memory, I/O, program storage, a connecting bus,
and other appropriate components, could be programmed or
otherwise designed to facilitate the practice of the method of
the invention. Such a processor would include appropriate
program means for executing the method of the invention.

Also, an article of manufacture, such as a pre-recorded
disk or other similar computer program product, for use with
a data processing system, could include a storage medium
and program means recorded thereon for directing the data
processing system to facilitate the practice of the method of
the invention. It will be understood that such apparatus and
articles of manufacture also fall within the spirit and scope
of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a flowchart illustrating a first preferred
embodiment of the method of the invention.

FIG. 2 is a flowchart showing, in more detail, a portion of
the flowchart of FIG. 1.

FIG. 3 is a schematic diagram of sets of pages for
illustration of the steps executed by FIG. 2.

FIGS. 4 and 5 are schematic diagrams illustrating steps of
the flowchart of FIG. 1.

FIG. 6 is a flowchart illustrating a second preferred
embodiment of the method of the invention.

FIG. 7 is a flowchart showing, in more detail, a portion of
the flowchart of FIG. 6.

DESCRIPTION OF THE PREFERRED 
EMBODIMENT 
GENERAL DISCUSSION OF "AUTHORITATIVENESS" 

In accordance with the invention, a searching method, or
a search engine, goes beyond the notion of "relevance,"
judged in terms of satisfaction of a keyword search strategy,
to the notion of "authoritativeness."

As with the notion of relevance, the evaluation of
authoritativeness necessarily depends on human judgment. In other
words, the notion of authoritativeness may be subjective, or
may depend on a variety of factors.



However, in a large hyper-media environment, it is possible
to judge authoritativeness by making use of the judgment of
users who have already created links in the environment.
Thus it is our thesis here that, to a large extent, the
authority, or authoritativeness, of a page can be determined 
by making use of the existing link structure of the environ- 
ment. Because the existing link structure is objectively 
observable, it is possible to automate the evaluation of 
authoritativeness. 

The method of the invention, in its preferred embodi- 
ments to be described below, is predicated on the notion that, 
in a hyper-media environment such as the World Wide Web, 
there are in fact two distinct types of authoritative pages that 
are of value to a user. Heretofore the terms "authority," 
"authoritative," etc., have been used generically. In the 
discussion which follows, the terms will be used more 
specifically to denote one of the two distinct types. (Overall, 
it is hoped that the context of a given portion of the 
discussion will make clear whether the term is being used in 
the generic sense, or in the narrower sense.) 

First, there are pages that will be termed "authorities." 
Authority pages correspond intuitively to "definitive refer- 
ences" on the query topic, and receive a large amount of user 
traffic. In hyperlinked environments, an authority is prefer- 
ably a page to which there are links in a large number of 
other pages on related subject matter. An authority page is
regarded as authoritative as to a topic in the sense that it is 
the destination of a large number of links related to that 
subject matter. 

Second, there are pages that will be termed "hubs." A hub 
is a page containing a large number of links to other pages. 
Hub pages correspond intuitively to the "hot-lists," book- 
mark collections, and other compendia of resources that 
have been compiled by individual creators of pages. A hub 
page is regarded as authoritative in the sense that a large 
number of other related pages can be found by following its 
links. 

Authorities and hubs have what could be termed a self- 
reinforcing relationship. That is, if H is a set of pages that are 
potential hubs, then the pages that they point to the most 
become candidate authorities. Conversely, if A is a set of 
pages that are potential authorities, then the set of pages that 
point to them the most become candidate hubs. 

The method of the invention makes use of this relation- 
ship between hubs and authorities. Preferred embodiments 
of the invention are implemented as a method which itera- 
tively follows links backward to hubs and forward to 
authorities, keeping authoritativeness information, prefer- 
ably in the form of scores which reflect the number of links. 

It is possible to obtain useful results by running only a 
single iteration. In that case, it is not really meaningful to 
call the single execution an "iteration." Rather, the method 
simply executes a sequence of steps once, and then is
finished. 

Successive iterations, however, cause the scores for the 
various pages to converge to a set of authorities and hubs, 
including a neighborhood having maximal scores. It can be 
proved that, upon successive iterations, the scores converge 
to final values, and that the resulting equilibrium state 
satisfies a precise version of the above self-reinforcing 
property of authorities and hubs. See F. Narin et al.,
"Bibliometrics," Annual Review of Information Science and
Technology, White Plains, N.Y.: Knowledge Industry 
Publications, Inc., 1977, pp. 35-58. 
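
The equilibrium mentioned above can be written compactly. The following is only a sketch, in matrix notation that the text itself does not use: $M$ is assumed to be the link matrix, with $M_{uv} = 1$ when page $u$ links to page $v$ and $0$ otherwise, and $a$ and $h$ are the vectors of authority and hub scores. It also assumes that each authority score is the sum of the hub scores of the pages linking to it, that each hub score is the sum of the authority scores of the pages it links to, and that both vectors are rescaled after every pass.

```latex
\begin{aligned}
a &\leftarrow M^{\mathsf T} h, & h &\leftarrow M a
  && \text{(one pass, before rescaling)} \\
\lambda\, a^{*} &= M^{\mathsf T} M\, a^{*}, &
\lambda\, h^{*} &= M M^{\mathsf T} h^{*}
  && \text{(equilibrium, for some scalar } \lambda > 0\text{)}
\end{aligned}
```

Under these assumptions, scores that no longer change direction from one pass to the next are eigenvectors of $M^{\mathsf T} M$ and $M M^{\mathsf T}$, which is one way to state the self-reinforcing property precisely: good authorities are exactly the pages pointed to by good hubs, and the reverse.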

Note, by the way, that depending on the particular way in 
which scores are computed, they can have positive or 
negative values. The signs of the scores have no inherent 
meaning, and are significant only to the extent that they
identify distinct neighborhoods.

The method can be directly extended to produce several,
relatively disjoint, communities of authorities and hubs. The
method of the invention, when practiced in such a fashion,
serves a clustering function. That is, the more disjoint these
communities are, the more they are capable of corresponding
to intuitive partitions of the query topic. The partitions
may be made according to various criteria including both
semantic distinctions and social "clustering" among creators
of hyper-links.

FIRST EMBODIMENT - A SINGLE NEIGHBORHOOD

Preferred embodiments of the iterative algorithm of the
invention will now be described in more detail. The
discussion of the first preferred embodiment will make use of
FIGS. 1-5.

FIG. 1 depicts the basic iterative algorithm of the
invention. Initially, a set of query parameters may be chosen, as
suitable for the particular situation. For instance, as per step
2, a number m of initial pages, a number T of iterations, and
an output size k may be designated.

One mode of operation for the algorithm is for a user to
have in mind a search strategy, such as a logical combination
of keywords, defining the desired subject matter. The basic
concepts of defining keyword search strategies are well
known, and will not be elaborated upon here.

Alternately, the user may make use of any other method
for generating an initial set of hyper-linked pages. For
instance, if the user knows of one page with subject matter
of interest, and seeks to find authoritative pages as to that
subject matter, the initial set may be obtained merely by
finding other pages linked to or from that page.

In any event, the result will be an initial set P of pages.

Referring to FIG. 1, the initial set P of pages containing
the query string is computed, as above (step 4). Any suitable
method of identifying the set P may be used. A preferred
method is via a standard term-matching algorithm. The set
P may be specified as to size, through the use of a selected
parameter m, as discussed in connection with step 2.

In accordance with the invention, the hyperlinks between
different pages are used to determine the authoritativeness of
pages which have been found. If a given search strategy
finds a number of related pages, many of them are likely to
have hyperlinks in common, either between each other or
to/from other pages in common. If the search also captures
a "false hit" of unrelated subject matter, that unrelated page
will lack such hyperlinks in common. The invention takes
advantage of this fact to establish the authoritativeness of a
set of pages.

Preferably, the initial set P is used to establish a
"neighborhood" of common or interconnecting hyperlinks (step 6).

In a preferred embodiment, the method (step 6) for
constructing the neighborhood of the set P of pages is
depicted in FIG. 2. The neighborhood itself is shown
schematically in FIG. 3. FIGS. 2 and 3 will now be discussed,
and afterward, the discussion of FIG. 1 will resume.

Step 8 of FIG. 2 states that we begin with the set P of
pages produced by step 4. That set is shown collectively as
10 in FIG. 3. Individual pages 12 are shown schematically
as small circles. Hyperlinks 14 are shown as arrows. A
hyperlink 14 is in a page 12 if the tail end of the arrow
touches that page, and the hyperlink 14 points to another
page 12 touched by the head of the arrow.

In step 16, a set Q (shown as 18 in FIG. 3) is obtained,
consisting of all pages 12 which are pointed to by the pages
12 in the set P 10. The set Q 18 is an initial set of pages
which will be referred to as "authorities," meaning that a
large number of other pages have links to the authority
pages.

In step 20, a set R (shown as 22 in FIG. 3) is obtained,
consisting of all pages 12 which point to the pages 12 in the
set P 10. The set R 22 is an initial set of pages which will
be referred to as "hubs," meaning that they contain a large
number of links to authoritative pages.

When such sets are constructed, since most of the pages
12 making up the sets will likely relate to the desired subject
matter, it will also likely be the case that pages in the sets Q
18 and R 22 will have links to each other. One such link 24
is shown, between pages 12 in the sets Q 18 and R 22. This
exemplary link further illustrates the concept that a "hub"
page (the page 12 in the set R 22 which is the source of the
link 24) has a link to an "authority" page (the page 12 in the
set Q 18 which is the destination of the link 24).

Therefore, the neighborhood consists of all pages that
either point to a page in the set P 10, or are pointed to by a
page in the set P 10. In other words, the neighborhood
includes the initial set of pages, an initial set of hub pages,
and an initial set of authority pages. The neighborhood is
defined in software, in a suitable manner for further
processing, preferably using conventional database
techniques such as graphing or metadata (step 26). To limit the
size of the neighborhood, one can optionally impose an
upper bound on the allowed number of pages pointing to any
single page in P.

Optionally, a graph such as that shown in FIG. 3 may be
constructed for display, to allow the user to see the state of
the query.

Returning to FIG. 1, an iterative process is set up for
refining the sets of hubs and authorities, by alternately
finding other pages that the pages of the neighborhood are
linked to, and finding other pages that are linked to the pages
of the neighborhood. The next several steps of FIG. 1
illustrate a preferred way of doing so.

In step 28, hub and authority vectors H and A are defined,
where each term of each of the vectors corresponds with one
of the pages in the neighborhood. The iterative algorithm is
to operate on these vectors.

In the preferred embodiment, the initial values of H and
A are computed as follows, where u and v are pages in the
neighborhood:

The vector H is initialized as follows:

H[v]=1 if v belongs to P
H[v]=0 if v does not belong to P
A[v]=0 for all pages v

The entries in these two vectors are now updated
iteratively (step 30). One preferred method for performing this
updating is given in flowchart form in the next several steps
of FIG. 1, and depicted graphically in FIGS. 4 and 5.

If u and v are pages, let u→v denote the presence of a link
from u to v. Then the values of the terms of the hub and
authority vectors H and A are updated as follows:

$$A[v] \leftarrow \sum_{u \to v} H[u] \tag{1}$$

$$H[v] \leftarrow \sum_{v \to u} A[u] \tag{2}$$

These equations are shown respectively as steps 32 and
34 in FIG. 1. Equation (1) is illustrated in FIG. 4, in which
three pages u1, u2, and u3 have links to a page v. The
authority vector's term A[v] for the page v is the sum of the
hub vector values H[u1], H[u2], and H[u3] for the three
pages u1, u2, and u3.

Similarly, Equation (2) is illustrated in FIG. 5, in which a
page v has links to three pages u1, u2, and u3. The hub
vector's term H[v] for the page v is the sum of the authority
vector values A[u1], A[u2], and A[u3] for the three pages
u1, u2, and u3.

It will be seen that, as these iterations are performed, the
values of the terms of the hub and authority vectors will
increase. Accordingly, the vectors are preferably
normalized, to prevent the numerical values from growing
too large (step 36). One preferred normalization method is
the following:

$$A[v] \leftarrow \frac{A[v]}{\sum_{u} (A[u])^{2}} \tag{3}$$

$$H[v] \leftarrow \frac{H[v]}{\sum_{u} (H[u])^{2}} \tag{4}$$

Following the normalization of step 36, the iteration is
complete. Further iterations may follow as appropriate, such
as by looping back T times, where T is the number of
iterations specified in step 2.

It will be seen that, as the successive iterations proceed,
the hub and authority vector values will increase based on
the number of links common to the page populations. The
pages unrelated to the desired subject matter, which will
have relatively few links to the pages related to the desired
subject matter, will have relatively low values, and will, in
effect, be "weeded out."

When the iterations have been completed, FIG. 1
concludes by outputting its final results. A preferred output
technique, given in steps 38 and 40, is to scan the hub and
authority vectors H and A, to find the k largest terms, k
having been specified in step 2, and being presumptively
smaller than the number of pages identified.

Note that steps 28-36 may be executed only a single time,
and still obtain useful results. Depending on the particular
situation in which the invention is to be practiced, a user may

Initially, the implementation of FIG. 6 chooses the
additional input parameter q, a number of neighborhoods (i.e., of
hub and authority vectors) to be found (step 42). In step 44,
initial values for the terms of the hub and authority vectors
are set.

However, the initialization is preferably performed in a
different manner from what was done in step 28 of FIG. 1.
The method of FIG. 6 is to find several neighborhoods.
Consequently, it is necessary that the final
result of the iterations be multiple vectors. In order
for the iterations to converge to multiple distinct vectors, it
is necessary that no two of the vectors become equal during
the course of the iterations.

For this purpose, the vectors are initialized so as to be
orthogonal. Moreover, following each iteration, they are
again updated so as to remain orthogonal. This updating step
can be accomplished by the standard Gram-Schmidt
procedure, as given in G. Golub, C.F. Van Loan, "Matrix
Computations", Johns Hopkins University Press, 1989.

In light of the foregoing, the preferred embodiment of the
invention is as follows: Before the iterations begin, in step
46 the hub vectors are orthogonalized. The initial
orthogonalization may conveniently be performed by assigning each
coordinate a real-number value chosen uniformly at random
from the interval [0,1].

The iterations are now performed (step 48). For a given
iteration, the summing, similar to those given above in
Equations (1) and (2), is done separately over each pair of
hub and authority vectors (A_i, H_i).

At the end of each iteration, the vectors are modified to be
mutually orthogonal. This can be accomplished by the
standard Gram-Schmidt procedure given in G. Golub
(supra).

A preferred sequence of the steps of an iteration is given
in FIG. 6, as follows: In step 50, the authority vectors are
updated. When they are all updated, they are then
orthogonalized (step 52). Then, in step 54, the hub vectors are
updated. When they are all updated, they are then
orthogonalized (step 56). This completes an iteration. The iteration
is repeated a desired number of times.

As with the embodiment of FIG. 1, the largest (positive)
entries of A_0 and H_0 are returned as the primary hubs and
authorities. One can then define 2q additional authority/hub
communities, by taking the q most positive and the q most
negative entries from each of the pairs of vectors (A_i, H_i), for

choose to run only a single iteration and accept the results as i«l, . . . , q. 

satisfactory, to run a relatively small number of iterations, Note that the Gram -Schmidt procedure, which includes 

such as a fixed number or a number required to reach some subtractions, can produce negative values for vector terms, 

extrinsic limit such as a limit imposed by cost or other The positivity or negativity of the entries does not have a 

factors, or to run until convergence of the results is detected. 50 direct meaning in the context of the method. Rather, a more 

SECOND EMBODIMENT— SEVERAL NEIGHBOR- significant meaning is attributed to the magnitudes, i.e., 

HOODS absolute values, of the terms. In general, the more links to 

The above-described method may be extended to locate or from a page, or, more broadly, the greater the authorita- 

several communities of authorities and hubs. Iterations are tiveness of the page as to the desired subject matter, the 

performed in essentially the same manner as described 55 greater the magnitude of the value will be. 

above, but now, several vectors of each type are maintained. The noteworthy property of the entries, taken as a group, 

For instance, if there are to be q hub vectors and q authority is simply that they may be partitioned into two or more 

vectors, representing q number of distinct neighborhoods, communities, based on their ranges of values. It may be ' 

then the hub and authority vectors arc shown as distin- convenient or desirable, where one set is positive and the 

guished by index subscripts, as follows: Aq, . . . , A q and , 60 other set is negative, to partition at the zero value. However, 

H 0 , - . • , H^. it is not crucial that the partitions be evenly distributed or 

FIG. 6 is a flowchart, comparable to that of FIG. 1, symmetric. More generally, any subset of the communities 

showing a preferred embodiment of the invention where a can be returned, possibly according to additional criteria 

plurality of neighborhoods are to be found. Certain steps imposed by the user on the set of pages, 

which were shown in FIG. 1 have been omitted from FIG, 65 For discussion purposes, however, an example will be 

6 for the sake of brevity. It will be understood, however, that given in which partitioning is to be symmetric about the zero 

these omitted steps are to be included, as appropriate. point. 
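By way of illustration only, the iteration of the second embodiment may be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the function names (`gram_schmidt`, `update_authorities`, `update_hubs`, `find_neighborhoods`, `split_at_zero`), the representation of links as (source, destination) pairs over pages numbered 0 to n-1, the parameter `count` for the number of vector pairs, and the fixed random seed are all hypothetical choices made for the example. The link directions assume that a hub points to authorities.

```python
import math
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def gram_schmidt(vectors, eps=1e-12):
    """Return an orthonormal list built from `vectors` (modified Gram-Schmidt)."""
    basis = []
    for vec in vectors:
        w = list(vec)
        for b in basis:
            coeff = dot(w, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        # A (near-)zero remainder means the vector was linearly dependent.
        basis.append([wi / norm for wi in w] if norm > eps else [0.0] * len(w))
    return basis


def update_authorities(links, hub, n):
    """Authority update: A[v] receives the sum of H[u] over links u -> v."""
    auth = [0.0] * n
    for u, v in links:
        auth[v] += hub[u]
    return auth


def update_hubs(links, auth, n):
    """Hub update: H[v] receives the sum of A[u] over links v -> u."""
    hub = [0.0] * n
    for v, u in links:
        hub[v] += auth[u]
    return hub


def find_neighborhoods(links, n, count, iterations=20, seed=0):
    """Maintain `count` hub/authority vector pairs, kept mutually orthogonal."""
    rng = random.Random(seed)
    # Random coordinates in [0, 1], then orthogonalize the hub vectors.
    hubs = gram_schmidt([[rng.random() for _ in range(n)] for _ in range(count)])
    auths = [[0.0] * n for _ in range(count)]
    for _ in range(iterations):
        # Update all authority vectors, then orthogonalize them.
        auths = gram_schmidt([update_authorities(links, h, n) for h in hubs])
        # Update all hub vectors, then orthogonalize them.
        hubs = gram_schmidt([update_hubs(links, a, n) for a in auths])
    return auths, hubs


def split_at_zero(vector, k):
    """Return up to k most positive and up to k most negative page indices."""
    order = sorted(range(len(vector)), key=lambda p: vector[p])
    most_negative = [p for p in order[:k] if vector[p] < 0]
    most_positive = [p for p in reversed(order[-k:]) if vector[p] > 0]
    return most_positive, most_negative


# Two disjoint groups of pages: hubs 0, 1 -> authorities 2, 3 and hubs 4, 5 -> 6, 7.
links = [(0, 2), (0, 3), (1, 2), (1, 3), (4, 6), (4, 7), (5, 6), (5, 7)]
auths, hubs = find_neighborhoods(links, n=8, count=2)
positive, negative = split_at_zero(auths[1], k=2)
```

In this sketch the Gram-Schmidt step also normalizes each vector to unit length, so no separate normalization pass is shown. Partitioning the second authority vector at zero places the authorities of the two groups on opposite sides, which is the symmetric partitioning discussed above.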
FIG. 7 gives a more detailed description of a preferred
implementation of step 60 of FIG. 6, subject to the constraint
of symmetry, just given. Where 2q communities are to be
identified, the method of the invention may conveniently be
implemented to follow the symmetry.

The definition of the 2q communities proceeds as a series
of q iterations of a sequence of steps, where each iteration
produces two communities, one for pages with large
coordinates and one for pages with small coordinates. Here,
"smallest" is taken to mean most negative, i.e., having the
largest negative magnitude.

The iteration is expressed, in step 62, as a FOR loop to be
executed q number of times.

In step 64, a community, indexed as community 2i-1 (for
1 ≤ i ≤ q), is defined, by choosing k pages with largest
coordinates in the vector H[i] as hubs (step 66), and choosing k
pages with largest coordinates in the vector A[i] as authorities.

Next, in step 70, a community, indexed as community 2i
(for 1 ≤ i ≤ q), is defined, by choosing k pages with smallest
coordinates in the vector H[i] as hubs, and choosing k pages
with smallest coordinates in the vector A[i] as authorities.

This completes an iteration. In successive iterations, the
pages with progressively smaller coordinates are used to
define the hubs and authorities for odd-indexed
communities, and the pages with progressively larger
coordinates are used to define the hubs and authorities for
even-indexed communities, until all 2q of the communities
have been defined.

MATHEMATICAL INTERPRETATION

The discussion which follows will present a somewhat
more theoretical treatment of the concepts relating to hubs
and authorities, which have been discussed above. Certain
aspects of the discussion may be foreseen from the existing
literature.

The hub and authority vectors H and A correspond to the
principal eigenvectors of two matrices associated with the
set of pages.

Let M denote the matrix whose (i,j) entry gives the
number of pages that point to both page i and page j. Let N
denote the matrix whose (i,j) entry gives the number of
pages that are pointed to by both page i and page j.

Then the above iterative procedures are in fact an
implementation of the power iteration method, given in G. Golub,
C. F. Van Loan, "Matrix Computations", Johns Hopkins
University Press, 1989, p. 351, for computing the principal
eigenvectors of the matrices M and N.

In particular, the authority vector A is the principal
eigenvector of M, and the hub vector H is the principal
eigenvector of N.

The additional vectors A_i and H_i correspond to
non-principal eigenvectors of M and N respectively. The use of
such eigenvectors for clustering is known as spectral
partitioning, and has been studied as a graph algorithm. See,
for instance, D. Spielman, S. Teng, "Spectral partitioning
works: Planar graphs and finite-element meshes,"
Proceedings of the 37th IEEE Symposium on Foundations of
Computer Science, 1996.

The entries of the matrices M and N correspond to
co-citation and bibliographic coupling, which have been
studied in the bibliometric literature.

Since the algorithm works on an arbitrary set of linked
pages, it is worth noting that it can be run in a
query-independent fashion. In particular, given a set of pages, the
identification of hubs and authorities among them gives a
method for automatically determining the topic that best
"fits" the set of pages.

SUMMARY

Using the foregoing specification, the invention may be
implemented using standard programming and/or engineering
techniques using computer programming software,
firmware, hardware or any combination or subcombination
thereof. Any such resulting program(s), having computer
readable program code means, may be embodied or provided
within one or more computer readable or usable media
such as fixed (hard) drives, disk, diskettes, optical disks,
magnetic tape, semiconductor memories such as read-only
memory (ROM), etc., or any transmitting/receiving medium
such as the Internet or other communication network or link,
thereby making a computer program product, i.e., an article
of manufacture, according to the invention. The article of
manufacture containing the computer programming code
may be made and/or used by executing the code directly
from one medium, by copying the code from one medium to
another medium, or by transmitting the code over a network.

An apparatus for making, using, or selling the invention
may be one or more processing systems including, but not
limited to, a central processing unit (CPU), memory, storage
devices, communication links, communication devices,
I/O devices, or any subcomponents or individual
parts of one or more processing systems, including software,
firmware, hardware or any combination or subcombination
thereof, which embody the invention as set forth in the
claims.

User input may be received from the keyboard, mouse,
pen, voice, touch screen, or any other means by which a
human can input data to a computer, including through other
programs such as application programs.

One skilled in the art of computer science will easily be
able to combine the software created as described with
appropriate general purpose or special purpose computer
hardware to create a computer system and/or computer
subcomponents embodying the invention and to create a
computer system and/or computer subcomponents for
carrying out the method of the invention. While the preferred
embodiment of the present invention has been illustrated in
detail, it should be apparent that modifications and
adaptations to that embodiment may occur to one skilled in the art
without departing from the spirit or scope of the present
invention as set forth in the following claims.

While the preferred embodiments of the present invention
have been illustrated in detail, it should be apparent that
modifications and adaptations to those embodiments may
occur to one skilled in the art without departing from the
scope of the present invention as set forth in the following
claims.

What is claimed is:

1. A computer program product, for use with a computer
system, for directing the computer system to execute a
search of information resources, the resources having
content-based links between each other, to identify a desired
subset of the information resources which satisfy a desired
criterion, the computer program product comprising:

a computer-readable medium;

means, provided on the recording medium, for directing
the computer system to identify an initial set of
information resources;

means, provided on the recording medium, for directing
the computer system to define initial authoritativeness
information for the initial set;

means, provided on the recording medium, for directing
the computer system to use the initial authoritativeness
information as input authoritativeness information, to
execute the steps of:
(i) producing first authoritativeness information about a
set of information resources pointed to by links in
resources of the input set, and

(ii) producing second authoritativeness information
about a set of information resources having links that
point to resources of the input set; and

means, provided on the recording medium, for directing
the computer system to produce a final set of
information resources based on the first and second
authoritativeness information.

2. A computer program product as recited in claim 1,
wherein the information resources include World Wide Web
pages, and the content-based links include hyperlinks.

3. A computer program product as recited in claim 1,
wherein the means for directing to identify an initial set of
information resources includes means, provided on the
recording medium, for directing the computer system to
obtain, as an input, an information resource containing
subject matter of interest.

4. A computer program product as recited in claim 3,
wherein the means for directing to identify an initial set of
information resources includes means, provided on the
recording medium, for directing the computer system to
identify a further set of information resources linked to the
input information resource.

5. A computer program product as recited in claim 1,
wherein:

the means for directing to execute the steps of producing
first and second authoritativeness information is
operative in a series of iterations;

the initial authoritativeness information is used as input
authoritativeness information for a first iteration; and

the produced first and second authoritativeness
information is a result of the iteration, the first and second
authoritativeness information produced in a given
iteration to be used as the input authoritativeness
information for the next iteration.

6. A computer program product as recited in claim 1
further comprising means, provided on the recording
medium, for directing the computer system to execute the
steps of producing first authoritativeness information and
producing second authoritativeness information in a series
of iterations until a predetermined condition is met.

7. A computer program product as recited in claim 6,
wherein the predetermined condition includes the execution
of a specified number of iterations.

8. A computer program product as recited in claim 6,
wherein the predetermined condition includes a steady state
in which further iterations result in substantially the same
results.

9. A computer program product as recited in claim 6,
wherein the means for directing to identify an initial set of
information resources includes means, provided on the
recording medium, for directing the computer system to
execute a keyword-based query search, results of the search
including information resources to be included in the initial
set.

10. A computer program product as recited in claim 9,
wherein the means for directing to identify an initial set of
information resources further includes means, provided on
the recording medium, for directing the computer system to
identify information resources linked to or from the
information resources which are the results of the search, the
former information resources also to be included in the
initial set.

11. A computer program product as recited in claim 10,
wherein the means for directing to define initial
authoritativeness information includes means, provided on the
recording medium, for directing the computer system to
select an initial numerical authoritativeness value for each of
the information resources of the initial set.

12. A computer program product as recited in claim 11,
wherein the means for directing to define initial
authoritativeness information further includes means, provided on the
recording medium, for directing the computer system to
define an authority value and a hub value for each of the
information resources of the initial set.

13. A computer program product as recited in claim 12,
wherein the defined authority values and hub values are
processed as vectors, each vector containing a respective
term corresponding with each respective one of the
information resources of the initial set, and having stored therein
the value defined for that respective one of the information
resources of the initial set.

14. A computer program product as recited in claim 12,
wherein:

an initial hub value is defined as 1 if the information
resource was found by the keyword-based query
search, and 0 if the information resource is linked to or
from the information resources which are the results of
the search; and

an initial authority value is defined as 0 for all information
resources.

15. A computer program product as recited in claim 12,
wherein, for each iteration:

the hub value for an information resource is updated as the
sum of the authority values for authority information
resources which point to the hub information resource;
and

the authority value for an information resource is updated
as the sum of the hub values for hub information
resources which are pointed to by the information
resource.

16. A computer program product as recited in claim 15,
wherein each iteration further includes normalizing the hub
and authority values for the information resources.

17. A computer program product as recited in claim 1,
wherein the means for directing to produce a final set of
information resources includes means, provided on the
recording medium, for directing the computer system to
select information resources from the set based on their hub
and authority values.

18. A computer program product as recited in claim 17,
wherein the means for directing to select includes means,
provided on the recording medium, for directing the
computer system to select information resources whose hub
values or authority values have greatest magnitudes.

19. A computer program product as recited in claim 17,
wherein the means for directing to select includes means,
provided on the recording medium, for directing the
computer system to select a plurality of successive communities,
selecting each successive community including selecting
information resources whose hub values or authority values
have greatest magnitudes of those information resources not
already selected for a prior community.

20. A method for executing a search of information
resources, the resources having content-based links between
each other, to identify a desired subset of the information
resources which satisfy a desired criterion, the method
comprising the steps of:

identifying an initial set of information resources;

defining initial authoritativeness information for the
initial set;
wherein the means for directing to define initial authorita- tial set; 
using the initial authoritativeness information as input
authoritativeness information, executing the steps of:

(i) producing first authoritativeness information about a 
set of information resources pointed to by links in 
resources of the input set, and

(ii) producing second authoritativeness information 
about a set of information resources having links that 
point to resources of the input set; and 

producing a final set of information resources based on 
the first and second authoritativeness information.

21. A method as recited in claim 20, wherein the infor- 
mation resources include World Wide Web pages, and the 
content-based links include hyperlinks. 

22. A method as recited in claim 20, wherein the step of 
identifying an initial set of information resources includes 
obtaining, as an input, an information resource containing
subject matter of interest. 

23. A method as recited in claim 22, wherein the step of 
identifying an initial set of information resources includes 
identifying a further set of information resources linked to 
the input information resource.

24. A method as recited in claim 20, wherein: 

the step of executing the steps of producing first and 
second authoritativeness information is executed in a 
series of iterations; 

the initial authoritativeness information is used as input
authoritativeness information for a first iteration; and 

the produced first and second authoritativeness
information is a result of the iteration, the first and second
authoritativeness information produced in a given
iteration to be used as the input authoritativeness
information for the next iteration.

25. A method as recited in claim 20, wherein the steps of 
producing first authoritativeness information and producing 
second authoritativeness information are executed in a series 

of iterations until a predetermined condition is met.

26. A method as recited in claim 25, wherein the prede- 
termined condition includes the execution of a specified 
number of iterations. 

27. A method as recited in claim 25, wherein the
predetermined condition includes a steady state in which further
iterations result in substantially the same results.

28. A method as recited in claim 25, wherein the step of 
identifying an initial set of information resources includes 
executing a keyword-based query search, results of the 
search including information resources to be included in the
initial set. 

29. A method as recited in claim 28, wherein the step of 
identifying an initial set of information resources further 
includes identifying information resources linked to or from 
the information resources which are the results of the search,
the former information resources also to be included in the 
initial set. 

30. A method as recited in claim 29, wherein the step of
defining initial authoritativeness information includes 
selecting an initial numerical authoritativeness value for
each of the information resources of the initial set. 

31. A method as recited in claim 30, wherein the step of 
defining initial authoritativeness information further 
includes defining an authority value and a hub value for each 

of the information resources of the initial set.

32. A method as recited in claim 31, wherein the defined 
authority values and hub values are processed as vectors, 
each vector containing a respective term corresponding with 
each respective one of the information resources of the 
initial set, and having stored therein the value defined for
that respective one of the information resources of the initial 
set. 
33. A method as recited in claim 31, wherein:

an initial hub value is defined as 1 if the information 
resource was found by the keyword-based query 
search, and 0 if the information resource is linked to or 
from the information resources which are the results of 
the search; and 

an initial authority value is defined as 0 for all information 
resources. 

34. A method as recited in claim 31, wherein, for each 
iteration: 

the hub value for an information resource is updated as the 
sum of the authority values for authority information 
resources which point to the hub information resource; 
and 

the authority value for an information resource is updated 
as the sum of the hub values for hub information 
resources which are pointed to by the information 
resource. 

35. A method as recited in claim 34, wherein each 
iteration further includes normalizing the hub and authority 
values for the information resources. 

36. A method as recited in claim 20, wherein: 

each information resource is associated with an authority 

value and a hub value; and 
the step of producing a final set of information resources 

includes selecting information resources from the set 

based on the hub and authority values. 

37. A method as recited in claim 36, wherein the step of 
selecting includes selecting information resources whose 
hub values or authority values have greatest magnitudes. 

38. A method as recited in claim 36, wherein the step of 
selecting includes selecting a plurality of successive 
communities, selecting each successive community includ- 
ing selecting information resources whose hub values or 
authority values have greatest magnitudes of those informa- 
tion resources not already selected for a prior community. 

39. A system for executing a search of information 
resources, the resources having content-based links between
each other, to identify a desired subset of the information 
resources which satisfy a desired criterion, the system com- 
prising: 

means for identifying an initial set of information 
resources; 

means for defining initial authoritativeness information 

for the initial set; 
means for using the initial authoritativeness information 

as input authoritativeness information, to execute the 

steps of: 

(i) producing first authoritativeness information about a 
set of information resources pointed to by links in 
resources of the input set, and 

(ii) producing second authoritativeness information 
about a set of information resources having links that 
point to resources of the input set; and 

means for producing a final set of information resources 
based on the first and second authoritativeness infor- 
mation. 

40. A system as recited in claim 39, wherein the infor- 
mation resources include World Wide Web pages, and the 
content-based links include hyperlinks. 

41. A system as recited in claim 39, wherein the means for 
identifying an initial set of information resources includes 
means for obtaining, as an input, an information resource 
containing subject matter of interest. 

42. A system as recited in claim 41, wherein the means for 
identifying an initial set of information resources includes 
means for identifying a further set of information resources
linked to the input information resource.

43. A system as recited in claim 39, wherein: 

the means for executing the steps of producing first and 
second authoritativeness information is operative in a
series of iterations; 

the initial authoritativeness information is used as input 
authoritativeness information for a first iteration; and 

the produced first and second authoritativeness
information is a result of the iteration, the first and second
authoritativeness information produced in a given
iteration to be used as the input authoritativeness
information for the next iteration.

44. A system as recited in claim 39 further comprising
means for executing the steps of producing first authorita- 
tiveness information and producing second authoritativeness 
information in a series of iterations until a predetermined 
condition is met. 

45. A system as recited in claim 44, wherein the
predetermined condition includes the execution of a specified
number of iterations.

46. A system as recited in claim 44, wherein the
predetermined condition includes a steady state in which further
iterations result in substantially the same results.

47. A system as recited in claim 44, wherein the means for 
identifying an initial set of information resources includes 
means for executing a keyword-based query search, results 
of the search including information resources to be included 

in the initial set.

48. A system as recited in claim 47, wherein the means for 
identifying an initial set of information resources further 
includes means for identifying information resources linked 
to or from the information resources which are the results of 
the search, the former information resources also to be
included in the initial set. 

49. A system as recited in claim 48, wherein the means for 
defining initial authoritativeness information includes means 
for selecting an initial numerical authoritativeness value for 
each of the information resources of the initial set.

50. A system as recited in claim 49, wherein the means for 
defining initial authoritativeness information further 
includes means for defining an authority value and a hub 
value for each of the information resources of the initial set. 
51. A system as recited in claim 50, wherein the defined
authority values and hub values are processed as vectors, 
each vector containing a respective term corresponding with 
each respective one of the information resources of the 
initial set, and having stored therein the value defined for 
that respective one of the information resources of the initial 
set. 

52. A system as recited in claim 50, wherein: 

an initial hub value is defined as 1 if the information 
resource was found by the keyword-based query 
search, and 0 if the information resource is linked to or 
from the information resources which are the results of 
the search; and 

an initial authority value is defined as 0 for all information 
resources. 

53. A system as recited in claim 50, wherein, for each 
iteration: 

the hub value for an information resource is updated as the 
sum of the authority values for authority information 
resources which point to the hub information resource; 
and 

the authority value for an information resource is updated 
as the sum of the hub values for hub information 
resources which are pointed to by the information 
resource. 

54. A system as recited in claim 53, wherein each iteration 
further includes normalizing the hub and authority values for 
the information resources. 

55. A system as recited in claim 39, wherein the means for 
producing a final set of information resources includes 
means for selecting information resources from the set based 
on their hub and authority values. 

56. A system as recited in claim 55, wherein the means for 
selecting includes means for selecting information resources 
whose hub values or authority values have greatest magni- 
tudes. 

57. A system as recited in claim 55, wherein the means for 
selecting includes means for selecting a plurality of succes- 
sive communities, selecting each successive community 
including selecting information resources whose hub values 
or authority values have greatest magnitudes of those infor- 
mation resources not already selected for a prior community. 
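
For illustration only, and without limiting the claims, three of the recited steps may be sketched as follows: forming the initial set from keyword-search results and the resources linked to or from them, defining the initial hub and authority values, and selecting successive communities by greatest magnitude. This is a minimal sketch under stated assumptions. The function names (`build_initial_set`, `initial_values`, `successive_communities`), the representation of links as (source, destination) pairs, and the use of a single value per resource when selecting communities are hypothetical choices for the example and are not drawn from the specification.

```python
def build_initial_set(search_results, links):
    """Search results plus every resource linked to or from a search result."""
    results = set(search_results)
    expanded = set(results)
    for src, dst in links:
        if src in results:
            expanded.add(dst)
        if dst in results:
            expanded.add(src)
    return results, expanded


def initial_values(results, resources):
    """Hub value 1 for search results, 0 for linked resources; authority 0 for all."""
    hub = {r: (1.0 if r in results else 0.0) for r in resources}
    auth = {r: 0.0 for r in resources}
    return hub, auth


def successive_communities(values, size, count):
    """Each community takes the greatest-magnitude resources not already selected."""
    ranked = sorted(values, key=lambda r: abs(values[r]), reverse=True)
    return [ranked[i * size:(i + 1) * size] for i in range(count)]


# Hypothetical data: one search result "a", linked to "b" and linked from "c".
results, resources = build_initial_set(["a"], [("a", "b"), ("c", "a"), ("d", "e")])
hub, auth = initial_values(results, resources)
communities = successive_communities({"a": 0.9, "b": -0.8, "c": 0.1, "d": -0.05}, 2, 2)
```

Ranking by absolute value reflects the statement in the description that the magnitude of a term, rather than its sign, carries the significant meaning.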

* * * * * 